Guard array_repeat list overflow #22295

Open
Sean-Kenneth-Doherty wants to merge 1 commit into apache:main from Sean-Kenneth-Doherty:codex/array-repeat-list-overflow

@Sean-Kenneth-Doherty

Which issue does this PR close?

Rationale for this change

array_repeat can panic for list inputs when it precomputes the repeated inner value count with len * count. A very large count can overflow that multiplication before DataFusion has a chance to return a normal execution error.
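To illustrate the failure mode, here is a minimal sketch of the checked multiplication. The helper name `repeated_inner_len` and the `String` error type are assumptions for illustration only; the actual change returns a DataFusion execution error and may be structured differently.

```rust
/// Hypothetical helper (not the PR's actual code): total number of inner
/// values produced when a list of `len` elements is repeated `count` times.
fn repeated_inner_len(len: usize, count: usize) -> Result<usize, String> {
    // `checked_mul` returns `None` on overflow instead of panicking
    // (debug builds) or silently wrapping (release builds).
    len.checked_mul(count)
        .ok_or_else(|| format!("array_repeat: {len} * {count} overflows usize"))
}
```

With this shape, `repeated_inner_len(3, 4)` yields `Ok(12)`, while a count of `usize::MAX / 2` (the same magnitude as `i64::MAX` on 64-bit targets) yields an `Err` rather than a panic.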

What changes are included in this PR?

  • Adds checked arithmetic while precomputing the list-path output sizes in array_repeat.
  • Validates the computed outer and inner offsets against the output offset type before allocating builders.
  • Uses fallible vector reservation for list-path capacity hints so capacity overflow becomes an execution error instead of a panic.
  • Adds a Rust unit regression and an SQL logic regression for array_repeat([1, 2, 3], 9223372036854775807).
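The offset validation and fallible reservation steps can be sketched roughly as follows. This is an assumption-laden illustration, not the code in this PR: `validate_offset` and `reserve_hint` are hypothetical names, errors are plain `String`s rather than DataFusion errors, and the offset type is fixed to `i32` (as used by Arrow's `List`; `LargeList` uses `i64`).

```rust
/// Hypothetical check: a computed length must fit in the 32-bit offset type
/// before any builder is allocated.
fn validate_offset(n: usize) -> Result<i32, String> {
    i32::try_from(n).map_err(|_| format!("array_repeat: offset {n} exceeds i32::MAX"))
}

/// Hypothetical capacity hint: `Vec::try_reserve` reports capacity overflow
/// or allocation failure as an error, where `Vec::reserve` would panic or abort.
fn reserve_hint(values: &mut Vec<i64>, additional: usize) -> Result<(), String> {
    values
        .try_reserve(additional)
        .map_err(|e| format!("array_repeat: cannot reserve {additional} values: {e}"))
}
```

Both helpers return a `Result`, so the caller can propagate the failure with `?` and surface it as a normal error.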

Scope note: this intentionally targets the list-input path from #22219. The scalar element path is separate from this issue.

Are these changes tested?

Yes.

  • cargo test -p datafusion-functions-nested list_repeat_rejects_inner_count_overflow
  • cargo test -p datafusion-sqllogictest --test sqllogictests -- array/array_repeat.slt
  • cargo test -p datafusion-functions-nested
  • cargo fmt --check
  • cargo clippy -p datafusion-functions-nested --all-targets -- -D warnings
  • git diff --check

Are there any user-facing changes?

Yes. A malformed/oversized array_repeat query now returns a DataFusion execution error instead of panicking the process.

@github-actions (bot) added the sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) labels on May 17, 2026
@Sean-Kenneth-Doherty
Author

Fresh local validation on 95e3d6f08:

  • cargo test -p datafusion-functions-nested list_repeat_rejects_inner_count_overflow -> passed
  • cargo test -p datafusion-sqllogictest --test sqllogictests -- array/array_repeat.slt -> passed
  • cargo test -p datafusion-functions-nested -> 63 passed plus 2 doctests passed
  • cargo fmt --all --check -> passed
  • cargo clippy -p datafusion-functions-nested --all-targets -- -D warnings -> passed
  • git diff --check -> clean

The GitHub Process check is also green. This PR is scoped to the list-input overflow from #22219; the scalar array_repeat overflow variants remain separate.

Labels

functions (Changes to functions implementation), sqllogictest (SQL Logic Tests (.slt))

Development

Successfully merging this pull request may close these issues.

panic: array_repeat list path overflows inner element count multiplication